14 research outputs found

    GermEval 2014 Named Entity Recognition Shared Task: Companion Paper

    This paper describes the GermEval 2014 Named Entity Recognition (NER) Shared Task workshop at KONVENS. It provides background information on the motivation of this task, the data-set, the evaluation method, and an overview of the participating systems, followed by a discussion of their results. In contrast to previous NER tasks, the GermEval 2014 edition uses an extended tagset to account for derivatives of names and tokens that contain name parts. Further, nested named entities had to be predicted, i.e. names that contain other names. The eleven participating teams employed a wide range of techniques in their systems. The most successful systems used state-of-the-art machine learning methods, combined with some knowledge-based features in hybrid systems.

    Network of the Day: Aggregating and Visualizing Entity Networks from Online Sources

    This software demonstration paper presents a project on the interactive visualization of social media data. The data presentation fuses German Twitter data and a social relation network extracted from German online news. Such fusion allows for comparative analysis of the two types of media. Our system will additionally enable users to explore relationships between named entities, and to investigate events as they develop over time. Cooperative tagging of relationships is enabled through the active involvement of users. The system is available online for a broad user audience.

    SemRelData – Multilingual Contextual Annotation of Semantic Relations between Nominals: Dataset and Guidelines

    Semantic relations play an important role in linguistic knowledge representation. Although their role is relevant in the context of written text, there is no approach or dataset that makes use of contextuality of classic semantic relations beyond the boundary of one sentence. We present the SemRelData dataset that contains annotations of semantic relations between nominals in the context of one paragraph. To be able to analyse the universality of this context notion, the annotation was performed on a multi-lingual and multi-genre corpus. To evaluate the dataset, it is compared to large, manually created knowledge resources in the respective languages. The comparison shows that knowledge bases not only have coverage gaps; they also do not account for semantic relations that are manifested in particular contexts only, yet still play an important role for text cohesion.

    GermEval 2015: LexSub -- A Shared Task for German-language Lexical Substitution

    Lexical substitution is a task in which participants are given a word in a short context and asked to provide a list of synonyms appropriate for that context. This paper describes GermEval 2015: LexSub, the first shared task for automated lexical substitution on German-language text. We describe the motivation for this task, the evaluation methods, and the manually annotated data set used to train and test the participating systems. Finally, we present an overview and discussion of the participating systems' methodologies, resources, and results.

    MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora

    In this paper, we present MDSWriter, a novel open-source annotation tool for creating multi-document summarization corpora. A major innovation of our tool is that we divide the complex summarization task into multiple steps, which enables us to efficiently guide the annotators, to store all their intermediate results, and to record user-system interaction data. This allows for evaluating the individual components of a complex summarization system and learning from the human writing process. MDSWriter is highly flexible and can be adapted to various other tasks.

    DBS Corpus

    The DBS corpus contains 93 multi-document summaries for 293 German documents about 30 education-related topics. We sampled the topics from the Deutscher Bildungsserver (DBS) webpage and crawled the documents linked there. The documents are highly heterogeneous in terms of text type, genre, and style. The multi-document summaries are the result of a seven-step annotation process yielding coherent extracts – a novel type of summary that is based on phrases extracted from the original documents that have been ordered and minimally redacted to form a well-readable, coherent text. The data of all intermediate steps is part of the repository to allow for extensive system evaluation. If you use the corpus in academic works, please cite our COLING paper.

    MDSWriter

    MDSWriter is software for manually creating multi-document summarization corpora and a platform for developing complex annotation tasks spanning multiple steps. If you use or build upon MDSWriter, please cite our ACL demo paper.